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Abstract 

We describe NIMBLE, a system for programming statistical algorithms for general 
model structures within R. NIMBLE is designed to meet three challenges: flexible 
model specification, a language for programming algorithms that can use different 
models, and a balance between high-level programmability and execution efficiency. 

Eor model specification, NIMBLE extends the BUGS language and creates model 
objects, which can manipulate variables, calculate log probability values, generate 
simulations, and query the relationships among variables. Eor algorithm program¬ 
ming, NIMBLE provides functions that operate with model objects using two stages 
of evaluation. The first stage allows specialization of a function to a particular model 
and/or nodes, such as creating a Metropolis-Bastings sampler for a particular block of 
nodes. The second stage allows repeated execution of computations using the results 
of the first stage. To achieve efficient second-stage computation, NIMBLE compiles 
models and functions via C-|—1-, using the Eigen library for linear algebra, and provides 
the user with an interface to compiled objects. The NIMBLE language represents a 
compilable domain-specific language (DSL) embedded within R. This paper provides 
an overview of the design and rationale for NIMBLE along with illustrative exam¬ 
ples including importance sampling, Markov chain Monte Carlo (MCMC) and Monte 
Carlo expectation maximization (MCEM). 

Keywords: domain-specific language; hierarchical models; probabilistic programming; R; 

MCEM; MCMC 
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1 Introduction 

Rapid advances in many statistical application domains are facilitated by compntational 
methods for estimation and inference with cnstomized hierarchical statistical models. These 
inclnde snch diverse fields as ecology and evolntionary biology, edncation, psychology, eco¬ 
nomics, epidemiology, and political science, among others. Althongh each held has differ¬ 
ent contexts, they share the statistical challenges that arise from non-independence among 
data - from spatial, temporal, clnstered or other sonrces of shared variation - that are often 
modeled nsing nnobserved (often nnobservable) random variables in a hierarchical model 
(e.g., Banerjee et aLf 2003 Royle and Dorazio, |2008[ |Cressie and Wikle| |2011|). 


strnctnre 


Advancement of analysis methods for snch models is a major research area, inclnding 


improved performance of Markov chain Monte Carlo (MCMC) algorithms (Brooks et ah 


2011), development of maximnm likelihood methods (e.g., Jacqnier et ah, 2007; Lele et al. 


2010 


de Valpine, 2012), new approximations (e.g., [Rne et al. 2009), methods for mode 


selection and assessment (e.g., Hjort et al., [2006] Gelman et ah, 2014), combinations of 
ideas snch as seqnential Monte Carlo and MCMC (Andrien et ah, 2010), and many others. 


However, the cnrrent state of software for hierarchical models leaves a large gap between the 
limited methods available for easy application and the newer ideas that emerge constantly 
in the statistical literatnre. In this paper we introdnce a new approach to software design 
for programming and sharing snch algorithms for general model strnctnres, implemented 
in the NIMBLE package. 

The key idea of NIMBLE is to combine flexible model specification with a system 
for programming fnnctions that can adapt to model strnctnres. This contrasts with two 
common statistical software designs. In the most common approach, a package provides 
a fairly narrowly constrained family of models together with algorithms cnstomized to 
those models. A fnndamentally different approach has been to provide a langnage for 
model specification, thereby allowing a mnch wider class of models. Of these, the BUGS 


langnage (Gilks et ah, 1994) has been most widely nsed, with dialects implemented in 


WinBUGS, OpenBUGS, and JAGS (Lnnn et ah, 2000, 2012 Plnmmer, 2003). Other tools 


with their own modeling langnage (or similar system) inclnde AD Model Bnilder and its 


newer version. Template Model Bnilder (Fonrnier et al., 2012 Kristensen et al., 2015); Stan 


(2015); BayesX (Belitz et ah, 2013); and PyMG (Path et ah, 2010). All of these packages 
have been snccessfnl for providing specific target algorithms, snch as Laplace approximation 
and specihc kinds of MGMG, bnt none provide a high-level way to write many different 
kinds of algorithms that can nse the flexibly-dehned models. NIMBLE aims to do that via 


a compilable domain specihc langnage (DSL; Elliott et ah, 2003) embedded within R 


The design of NIMBLE nses several approaches that we think are new for statistical 
software. To get started, we needed a general langnage for model specihcation, for which 
we adopted and extended BUGS becanse it has been so widely nsed. NIMBLE processes 
BUGS code into a model object that can be nsed by programs: it can be qneried for variable 
relationships and operated for simnlations or probability calcnlations. R was a natnral ht 
for implementing this idea becanse of its high syntactic compatibility with BUGS and its 
ability to modify and evalnate parsed code as a hrst-class object, owing to its roots in Lisp 
and Scheme (Ihaka and Gentleman, 1996). Second, to allow model-generic programming. 


we needed a way for fnnctions to adapt to different model strnctnres by separating one-time 
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“setup” steps, such as querying a model’s structure, from repeated “run-time” steps, such 
as running a Metropolis-Hastings sampler. This was accomplished by allowing these steps 
to be written separately and using the concepts of specialization and staged evaluation from 
computer science (Taha and Sheard, 1997; Rompf and Odersky, 2010). Third, we needed 


a way to allow high-level programming of algorithms yet achieve efficient computation. 
This was done by creating a compiler to translate the model and run-time functions to 
corresponding C-|--|- code and interfacing to the resulting objects from R. 

NIMBLE includes a domain specihc language (DSL) embedded within R. “Run-time” 
code can be thought of as a subset of R with some special functions for handling models. 
Programming in NIMBLE is a lot like programming in R, but the DSL formally represents a 
distinct language dehned by what is allowed for compilation. NIMBLE stands for Numerical 
Inference for statistical Models using Bayesian and Likelihood Estimation. 

The rest of this paper is organized as follows. First we give an overview of NIMBLE’s 
motivation and design, discussing each major part and how they interact, without spe¬ 
cihc implementation details. Then we give examples of three algorithms - importance 


sampling, MCMC, and Monte Carlo expectation maximization (MCEM; Wei and Tanner 


1990 Levine and Casella, 2001) - operating on one model to illustrate how the pieces ht 


together to provide a hexible system. These examples, and the more complete code avail¬ 
able in the online supplement, provide an introduction to NIMBLE’s implementation. A 
complete user manual is available at the project web site (R-nimble.org). 

2 Overview of NIMBLE 

NIMBLE comprises three main components (Fig. 1): a new implementation of BUGS (with 
extensions) as a model declaration language (Fig. 1:A-C); the nimbleFunction system for 
programming with models (Fig. 1:D-G); and the NIMBLE compiler for model objects and 
nimbleFunctions (Fig. 1:H-J). 

2.1 Design rationale 

The goal of NIMBLE is to make it easier to implement and apply a variety of algorithms 
to any model dehned as a directed acyclic graph (DAG). For example, we might want 
to use (i) several varieties of MGMG to see which is most efficient (Brooks et al., 2011), 


including programmatic exploration of valid MGMG samplers for a particular model; (ii) 
other Monte Garlo methods such as sequential Monte Garlo (SMG, a.k.a. “particle hlters”; 
Doucet et al., 2001) or importance sampling; (iii) modular combinations of methods, such 


as combination of particle hlters and MGMG in state-space time-series models (Andrieu 


et al., 2010) or combination of Laplace approximation and MGMG for diherent levels of the 


model; (iv) algorithms for maximum likelihood estimation such as MGEM and data cloning 
(Lele et ah, 2007 Jacquier et ah, 2007 de Valpine, 2012); (v) methods for model criticism. 


model selection, and estimation of prediction error (Vehtari and Ojanen, 2012) such as 


Bayesian cross-validation (Gelfand et al. 


1992; Stern and Gressie, 2000), calibrated poste¬ 


rior predictive p-values (Hjort et al., 2006) or alternatives to DIG such as WAIG ( Watanabef 
2010 Spiegelhalter et ah, 2014); (vi) “likelihood free” or “plug-and-play” methods such as 


synthetic likelihood (Wood, 2010), approximate Bayesian computation (ABG; Marjoram 
et ah, 2003), or iterated hltering (lonides et ah, 2006); (vii) parametric bootstrapping of 
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A. BUGS code 


D. nimbleFunction definition 

mu ~ dnorm(0, sd = 1000) 


D1. Setup function (U* stage, written in R) 

for(i in 1:3) y[i] ~ dnorm(mu, sd = sigma) 


D2. Run-time function(s) 



(2'^'^ stage, written in NIMBLE DSL) 


B. Model definition 

mu ) 




C. Model object 


C1. Model query functions: 

Determine information about njodel 
variables and nodes 


C2. Interface to modelValues object 
for storing model variables 

mu:: □ logProb_mu: □ 
y: rm logProb_y: fm 


C3. Model operation functions 
(specialized nImbleFunctlons) 


^mu: 


• 

calculate 

• 

simulate 

• 

getLogProb 

• 

\ 

calculateDIff 

y 


y[1]:... 


y[2]:... 


y[3]: 


I 



E. nImbleFunctlon called In R 
executes setup function 



I 


F. Specialized nImbleFunctlon object 

F1. Results from setup function 

F2. Run-time function(s) (NIMBLE DSL) 



Run and debug in 




FI. NIMBLE compiler generates 
and manages C-i-i- 





I. Compiled model object 

11. Model variables 

12. Compiled model 
operation functions 


J. Compiled 
nImbleFunctlon object 

J1. Member data 
J2. Run-time function(s) 


Figure 1: Overview of NIMBLE. Left side: A model starts as BUGS code (A), which 
is turned into a model definition object (B), which creates an uncompiled model ob¬ 
ject (C). Right side: A nimbleFunction starts as model-generic code (D). It is specialized 
to a model and/other arguments by executing its setup function (E), which may inspect 
the model structure (brown arrow, using Cl). This returns an uncompiled, specialized 
nimbleFunction object (F). Its run-time function(s) can be executed in R, using the un¬ 
compiled model (brown arrows), to debug algorithm logic (G). Parts of the model and 
nimbleFunction (red boxes) can be compiled (H), creating objects (I, J) that can be 
used from R similarly to their uncompiled counterparts. Gray = code. Blue = R exe¬ 
cution. Green, purple & tan = Uncompiled objects that run in pure R. Green arrows = 
pre-compilation workflow. Red boxes & arrows = compilation workflow. 
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any of the above ideas; or (viii) the same model and algorithm for mnltiple data sets. These 
are jnst some of many ideas that conld be listed. 

There are several reasons the above kinds of methods have been difhcnlt to handle 
in general software. First, if one has wanted to write a package providing a new general 
method, one has had to “reinvent the wheel” of model specihcation. This means deciding 
on a class of allowed models, writing a system for specifying the models, and writing the 
algorithm to nse that system. Creating model specihcation systems for each package is 
difhcnlt and tangential to the statistical algorithms themselves. It also resnlts in mnltiple 
diherent systems for specifying similar classes of models. For example, lme4 (Bates et al.| 


2014), MCMCglmm (Hadheld, 2010), R-INLA (Martins et ah, 2013), and others each nse a 


diherent system for GLMM specihcation. We desired a system with the hexibility of BUGS 
for declaring a wide range of models, while allowing diherent algorithms to nse the same 
representation of a given model. 

A second limitation of cnrrent designs arises from the tension between expressing al¬ 
gorithms easily in a high-level langnage and obtaining good compntational performance. 
High-level langnages, especially R, can be slow, bnt low-level langnages like C-I--I- reqnire 
mnch greater implementation ehort and cnstomization to diherent problems. A common 
solntion to this problem has been to write compntationally intensive steps in a low-level 
langnage and call them from the high-level langnage. This results in code that is less gen¬ 
eral and less accessible to other developers. Most of the general MCMC packages represent 
an extreme case of this phenomenon, with the algorithms hidden in a “black box” unless 
one digs into the low-level code. We wanted to keep more programming in a high-level 
language and use compilation to achieve efficiency. 


2.2 Specifying models: Extending the BUGS language 

We chose to build upon the BUGS language because it has been widely adopted (Lunn 


et ah 


2004 

Kery and Schaub 

2011; 

Vidakovic, 

2011 

), and domain scientists hnd that it helps 

them to reason clearly about models (iKery and Schaub 

2011 

). Many users of the BUGS 


packages think of BUGS as nearly synonymous with MGMG, but we distinguish BUGS as 
a DSL for model specihcation from its use in MGMG packages. The differences between 
BUGS dialects in JAGS, OpenBUGS, and WinBUGS are not important for this paper. 


2.2.1 BUGS, model definitions, and models 

When NIMBLE processes BUGS code (Fig. 1:A), it extracts all semantic relationships 
in model declarations and builds two primary objects from them. The hrst is a model 
definition object (Fig. 1:B), which includes a representation of graph nodes (also called 
vertices in graph theory) and edges. The second is a model object (Fig. 1:G), which 
contains functions for investigating model structure (Fig. 1:G1), objects to store values of 
model variables (Fig. 1:G2) and sets of functions for model calculations and simulations 
(Fig 1:G3). One model definition can create multiple models with identical structure. 
Normally a user interacts only with the model object, which may use its model definition 
object internally (Fig. 1: brown arrow from G1 to B). 

At this point, it will be useful to introduce several concepts of model definition ob¬ 
jects and model objects designed to accommodate the hexibility of BUGS. Each BUGS 
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declaration creates a node, which may be stochastic or deterministic (“logical” in BUGS). 
For example, node y[3] may be declared to follow a normal distribution with mean mu 
and standard deviation sigma (Fig 1:A). That would make mu and sigma parents of y[3] 
and y[3] a dependent (or dependeney) of mu and sigma. NIMBLE uses variable to refer to 
a possibly multivariate object whose elements represent one or more nodes. For example, 
the variable y includes all nodes declared for one or more elements of y (e.g., y[l] , y[2] , 
y[3]). A node can be multivariate, and such nodes can be occur arbitrarily in contiguous 
scalar elements of a variable. Groups of nodes in a variable may be declared by iteration, 
such that their role in the model follows a pattern, but they may also be declared sepa¬ 
rately, so it cannot be assumed in later processing that they do follow a pattern. The model 
definition uses abstractions for variables, nodes, and their graph relationships that sup¬ 
ports handling of interesting cases. For example, a program may need to determine the 
dependencies of just one element of a multivariate node, even though that element is not 
itself a node. 

Processing BUGS code in a high-level language like R facilitates some natural extensions 
to BUGS. First, NIMBLE makes BUGS extensible by allowing new functions and distri¬ 
butions to be provided as nimbleFunctions. Second, NIMBLE can transform a declared 
graph into different, equivalent graphs that may be needed for different implementation con¬ 
texts. Eor example, NIMBLE implements alternative parameterizations for distributions 
by automatically inserting nodes into the graph to transform from one parameterization to 
another. If the function that ultimately executes gamma probability density calculations 
needs the rate parameter but the BUGS code declares a node to follow a gamma distri¬ 
bution using the scale parameter (related by rate = 1/scale), a new node is inserted to 
calculate 1/scale, which is then used as the needed gamma rate. If any other declaration 
invokes the same reparameterization, it will use the same new node. Another important, 
optional, graph transformation occurs when the parameter of a distribution is an expres¬ 
sion. In that case a separate node can be created for the expression’s value and inserted 
for use where needed. This is useful when an algorithm needs access to the value of a 
parameter for a particular node, such as for conjugate distribution relationships used in 
Gibbs sampling and other contexts. A third extension is that BUGS code can be used to 
dehne a set of alternative models by including conditional statements (i.e., if-then-else) 
that NIMBLE evaluates (in R) when the model definition is created. This avoids the 
need to copy and modify entire BUGS model dehnitions for each alternative model, the 
standard practice when using previous BUGS packages. 

NIMBLE uses a more general concept of data than previous BUGS packages. In previous 
packages, a model cannot be dehned without its data. In NIMBLE, data is a label for the 
role played by certain nodes in a model. For example, nodes labeled as data are excluded 
from calls to simulate new values into the model by default, to avoid over-writing observed 
values, but this default can be over-ridden by a programmer who wishes to simulate fake 
data sets from the model. The data label is distinct from the actual values of nodes labeled 
as data, which can be programmatically changed. For example, one might want to iterate 
over multiple data sets, inserting each one into the data nodes of a model and running an 
algorithm of interest for each. 

At the time of this writing, some BUGS features are not implemented. Most notably. 
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NIMBLE does not yet allow stochastic indexing, i.e., indices that are not constants. 

2.2.2 How model objects are used 

A model object is used in two ways from R and/or nimbleFunctions. First, one may need 
to query node relationships, a common step in setup code (Fig 1: brown arrow from E to 
Cl). For example, consider a nimbleFunction for a Metropolis-Hastings MCMC sampler 
(shown in detail later). In one instance, it may be needed to sample a node called mu [2], 
in another to sample a node called x [3, 5], and so on. We refer to the node to be sampled 
as the target node. The setup stage of the nimbleFunction can query the model object 
to determine what stochastic nodes depend on the target node and save that information 
for repeated use by run-time code. Or it may be that an R function needs to query a 
model object, for example to determine if it conforms to the requirements for a particular 
algorithm. The implementation of the model uses its model definition to respond to 
such queries, but the nimbleFunction programmer is protected from that detail. 

Other examples of model queries include determining: 

• Topologically sorted order of nodes, which means an order valid for sequential calcu¬ 
lations or simulations. 

• All nodes or variables in the model of a particular type, such as stochastic, determin¬ 
istic, and/or data nodes. 

• The position of nodes in the model: e.g., top nodes have no stochastic parents; end 
nodes have no stochastic dependents; and latent nodes have stochastic parents and 
dependents. 

• The nodes contained in an arbitrary subset of variable elements. For example, x [3:5] 
may represent the three scalar nodes x [3], x [4], and x [5], or it may represent one 
scalar node x [3] and one multivariate node x [4:5], or other such combinations. 

• Nodes or expressions with certain semantic relationships, such as the node or expres¬ 
sion for the rate parameter of a gamma distribution. 

• A variety of kinds of dependencies from a set of nodes. For example, stochastic 
dependencies (also called “Markov blankets”) include all paths through the graph 
terminating at, and including, stochastic nodes. These are needed for many algo¬ 
rithms. In other cases, stochastic dependencies without data nodes are needed, such 
as for one time-step of a particle hlter. Deterministic dependencies are like stochastic 
dependencies but omit the stochastic nodes themselves. This kind of dependency is 
useful following the assignment of a value to a node to ensure descendent stochastic 
nodes use updated parameter values. 

The second way model objects are used is to manage node values and calculations, both 
of which are commonly needed in run-time functions (Fig 1:F2). A model object contains 
each model variable and any associated log probabilities (Fig 1:C2). It also can access 
functions for calculating log probabilities and generating simulations for each node (Fig 
1:C3). These functions are constructed as nimbleFunctions from each line of BUGS code. 
Specihcally, each node has a nimbleFunction with four run-time functions: 
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• calculate; For a stochastic node, this calculates the log probability mass or density 
function, stores the result in an element of the corresponding log probability variable 
(Fig. 1: C2), and returns it. For a deterministic node, calculate executes its 
computation, stores the result as the value of the node, and returns 0. 

• calculateDiff; This is like calculate except that for a stochastic node it returns 
the difference between the new log probability value and the previously stored value. 
This is useful for iterative algorithms such as Metropolis-Hastings-based MCMC. 

• simulate; For a stochastic node, this generates a draw from the distribution and 
stores it as the value of the node, simulate has no return value. For a deterministic 
node, simulate is identical to calculate except that it has no return value. 

• getLogProb; For a stochastic node, this returns the currently stored log probability 
value corresponding to the node. For a deterministic node, this returns 0. 

A model object has functions of the same names to call each of these node functions for 
an ordered sequence of nodes. With the exception of simulate, these return the sum of 
the values returned by the corresponding node functions (e.g., the sum of log probabilities 
for calculate). A typical idiom for model-generic programming is to determine a vector 
of nodes by inspecting the model in setup code and then use it for the above operations in 
run-time code. 

2.2.3 modelValues objects for storing multiple sets of model values 

A common need for hierarchical model algorithms is to store multiple sets of values for mul¬ 
tiple model variables, possibly including their associated log probability variables. NIM¬ 
BLE provides a modelValues data structure for this purpose. When a model definition 
is created, it builds a specihcation for the related modelValues class. When a model object 
is created, it includes an object of the modelValues class as a default location for model 
values. New modelValues classes and objects can be created with whatever variables and 
types are needed. Examples of uses of modelValues objects are; storing the output of 
MCMC; storing a set of simulated node values for input to importance sampling; and 
storing a set of “particle” values and associated log probabilities for a particle Liter. 

2.3 Programming with models 

One can use model objects arbitrarily in R, but NIMBLE’s system for model-generic pro¬ 
gramming is based on nimbleFunctions (Fig 1;D). Separate function dehnitions for the two 
evaluation stages - one setup function and one or more run-time functions - are written 
within the nimbleFunction (Fig 1;D1,D2). The purpose of a setup function is to specialize 
a nimbleFunction to a particular model object, nodes, or whatever other arguments are 
taken by the setup function. This typically involves one-time creation of objects that can 
be used repeatedly in run-time code. Such objects could be results from querying the model 
about node relationships, specializations of other nimbleFunctions, new modelValues ob¬ 
jects, or results from arbitrary R code. When a nimbleFunction is called, the arguments 
are passed to the setup function, which is evaluated in R (Fig 1;E). The nimbleFunction 
saves the evaluation environment and creates the return object. The return object is an 
instance of a custom-generated class whose member functions are the run-time function(s) 
(Fig 1;F). 



The two-stage evaluation of nimbleFunctions is similar to a function object (functor) 
system: the nimbleFunction is like an implicit class dehnition, and calling it is like instan¬ 
tiating an object of the class with initialization steps done by the setup function. However 
the nimbleFunction takes care of steps such as determining which objects created during 
setup evaluation need to become member data in a corresponding class dehnition and de¬ 
termining their types from specialized instances of the nimbleFunction. As a result, the 
programmer can focus on higher level logic. 

The run-time functions include a default-named run function and arbitrary others. 
These are written in the NIMBLE DSL, which allows them to be evaluated natively in 
R (Fig 1:G) or compiled into C-|--|- class methods (Fig 1:H). The former allows easier 
debugging of algorithm logic, while the latter allows much faster execution. It is also 
possible to omit the setup function and provide a single run function, which yields a 
simple function in the NIMBLE DSL that can be compiled to C-I--I- but has no hrst-stage 
evalnation and hence no specialization. For both models and nimbleFunctions, the R 
objects that use compiled or uncompiled versions provide a largely identical interface to 
the R user. 

The NIMBLE DSL supports control of model and modelValues objects, common math 
operations, and basic flow control. Control of model objects includes accessing values 
of nodes and variables as well as calling calculate, calculateDiff, simulate, and 
getLogProb for vectors of nodes. With these basic tools, a rnn-time fnnction can op¬ 
erate a model: get or set values, simulate values, and control log probability calculations. 
Use of modelValues objects includes setting and accessing specihc values and copying arbi¬ 
trary gronps of values between model and/or other modelValues objects using the special 
copy operation. Together these uses of modelValues facilitate iteration over sets of valnes 
for use in a model object. For example, a modelValues object might contain the “particle” 
sample of a particle hlter, and the rnn fnnction could iterate over them, using each one 
in the model for some simulation or calculation. Supported math operations include basic 
(vectorized) math, linear algebra, and probability distribution calculations. 

The two-stage evaluation system works natnrally when one nimbleFunction needs to 
use other nimbleFunctions. One nimbleFunction can specialize another nimbleFunction 
in its setup code, or take it as a setup argument, and then use it in rnn-time code. In 
addition one can create vectors of nimbleFunctions. There is a simple nimbleFunction 
class inheritance system that allows labeling of different nimbleFunctions that have the 
same rnn-time function prototype(s). For example, a nimbleFunction for MCMC contains 
a vector of nimbleFunctions, each of which npdates (samples) some subset of the model. 
The latter nimbleFunctions inherit from the same base class. This is a light burden for the 
NIMBLE programmer and allows the NIMBLE compiler to easily generate a simple C-I--I- 
class hierarchy. One can also create numeric objects, lists of same-type numeric objects, 
and customized modelValues objects in setup code for use in run-time code. 

The nimbleFunction system is designed to look and feel like R in many ways, but 
there are important differences. The setup function does not have a programmer-defined 
retnrn valne because the nimbleFunction system takes charge by returning a specialized 
nimbleFunction (ready for run-time fnnction execntion) after calling its setup function. 
More importantly, the run-time fnnction(s) have some highly non-R-like behavior. For 
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efficient C++ performance, they pass arguments by reference, opposite to R’s call-by-value 
semantics. To support the static typing of C++, once an object name is used it cannot 
subsequently be assigned to a different-type object. And type declarations of arguments 
and the return value are required in order to simplify compiler implementation. To a large 
extent, other types are inferred from the code. 


2.4 The NIMBLE compiler 

A thorough description of the NIMBLE compiler is beyond the scope of this paper, but we 
provide a brief overview of how nimbleFunctions and models are mapped to C++ and how 
NIMBLE manages the use of the compiled C++. The NIMBLE compiler generates a C++ 
class dehnition for a nimbleFunction. Results of setup code that are used in run-time 
code are turned into member data. The default-named run member function and other 
explicitly dehned run-time functions are turned into C++ member functions. Once the 
C++ code is generated, NIMBLE calls the C++ compiler and loads the resulting shared 
object into R. Finally, NIMBLE dynamically generates an R reference class dehnition to 
provide an interface (using active bindings) to all member data and functions of objects 
instantiated from compiled C++ (Fig 1:J). This creates an object with identical interface 
(member functions) as its uncompiled counterpart for the R user. When there are multiple 
instances (specializations) of the same nimbleFunction, they are built as multiple objects 
of the same C++ class. If a nimbleFunction is dehned with no setup code, then there is 
no hrst-stage evaluation, and the corresponding C++ is a function rather than a class. 

Compilation of models involves two components. Each line of BUGS code is represented 
as a custom-generated nimbleFunction with calculate, calculateDiff, simulate, and 
getLogProb run-time functions. These are compiled like any other nimbleFunction, the 
only diherence being inheritance from a common base class. This facilitates NIMBLE’s in¬ 
troduction of extensibility for BUGS by allowing new functions and distributions to be pro¬ 
vided as nimbleFunctions. The variables of a model and modelValues are implemented by 
generating simple G++ classes with appropriate member objects. Like nimbleFunctions, 
both models and modelValues objects are automatically interfaced via R objects that have 
similar interfaces to their uncompiled counterparts (Fig 1:1). 

For the most part, the compiler infers types and dimensionality of numeric variables 
and generates code for run-time size-checking and resizing. The exceptions include required 
declaration of run-time argument types and the return type as well as situations where size 
inference is not easy to implement. NIMBLE includes a library of functions and classes 
used in generated G++. Vectorized math and linear algebra are implemented by generating 
code for the Eigen G++ library (Guennebaud, Jacob, et ah, 2010). Basic for-loops for 


numeric iterators and basic flow control using if-then-else and do-while constructs are 
supported. The actual compilation processing converts run-time code into an abstract 
syntax tree (AST) with an associated symbol table, which are annotated and transformed 
into a G++ syntax tree. A set of R classes for representing G++ code was developed for 
this purpose. 

Gompilation of nimbleFunctions harnesses completed hrst-stage evaluation (special¬ 
ization) in several ways. First, contents of objects created during setup evaluation can 
be directly inspected to determine types. Second, the compiler uses partial evaluation to 
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simplify the C++ code and types needed. For example, the compiler resolves nodes in 
model objects at compile time so that the C++ code can hnd the right object by simple 
pointer dereferencing. It also converts vectors of nodes into different kinds of objects de¬ 
pending on how they are nsed in rnn-time code. Snch partial evalnation is done in the 
setup environment, essentially as a compiler-generated extension to the setup code. 


3 Examples 

In this section we present some examples of model-generic programming and the algorithm 
composition it snpports. This section inclndes more implementation details, inclnding 
some code for discnssion. Specihcally, we show how importance sampling and Metropolis- 
Hastings sampling are implemented as nimbleFunctions. Then we show how an MCMC is 
composed of mnltiple samplers that can be modihed programmatically from R. Finally we 
show an example of composing an algorithm that nses MCMC as one component, for which 
we choose MCEM. Complete code to replicate the examples is provided in the snpplement. 

As a model for illnstration of these algorithms, we choose the pnmp model from the 
WinBUGS/OpenBUGS snite of examples (Lnnn et ah, 2012) becanse it is simple to explain 
and nse. We assnme some familiarity with BUGS. The BUGS code is: 


pumpCode <- nimbleCode({ 
for (i in 1:N){ 

theta[i] ~ dgammafshape = alpha, rate = beta) ## random effects 
lambda[i] <- theta[i]*t[i] ## t[i] is explanatory data 

x[i] ~ dpois(lambda[i]) ## x[i] is response data 

} 

alpha ~ dexp(l.O) ## priors for alpha and beta 

beta ~ dgamma(0.1, 1.0) 

» 


Here x and t are to be provided as data (not shown), theta are random effects, and 
alpha and beta are the parameters of interest. We have written the gamma distribntion 
for theta [i] nsing named parameters to illnstrate this extension of BUGS. Greation of a 
model object called pumpModel from the pumpCode is shown in the snpplement. 


3.1 Importance sampling 

Importance sampling is a method for approximating an expected valne from a Monte Garlo 
sample (Givens and Hoeting, 2012). It illnstrates the glaring gap between algorithms 


and software: althongh it is an old and simple idea, it is not easily available for general 
model strnctnres. It involves sampling from one distribntion and weighting each valne so 
the weighted sample represents the distribntion involved in the expected valne. It can 
be nsed to approximate a normalizing constant snch as a likelihood or Bayes factor. (It 
can also be combined with a resampling step to sample from a Bayesian posterior, i.e., 
Sampling/Importance Resampling.) 

For the pnmp model, snppose we want to nse importance sampling to approximate 
the marginal likelihood of x[l:3], which reqnires integrating over the hrst three random 
effects, theta [1:3], given valnes of alpha and beta. This is an arbitrary snbset of the 
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model for illustration. To do so one simulates a sample theta [1:3] ^ ~ Pis(theta[l :3]), 

fc = 1... m, where Pis is a known distribution. For mathematical notation, we are mixing 

fh 

the code’s variable names with subscripts, so that theta [1:3] ^ is the simulated value 
of theta [1:3]. Then the likelihood is approximated as 


P(x[l:3]) ^ 


1 

m 


m 


E 


P(x [1:3] I theta [1:3] fc) 


P(theta[l :3 ]a:) 
Pis(theta[l :3] k) 


( 1 ) 


where P(-) indicates the part of the model’s probability density or mass labeled by its 
argument. The ratio on the right is the importance weight for the value of theta [1:3]. 
To keep the example concise, we assume the programmer already has a function (in R or 
NIMBLE) to sample from Pis and calculate the denominator of the weights. Our example 
shows the use of NIMBLE to calculate ([^ from those inputs. 

Model-generic NIMBLE code for calculation of ([^ is as follows: 

importanceSample <- nimbleFunctionC 

setup = functionCmodel, sampleNodes, mvSample) { 

calculationNodes <- model$getDependencies(sampleNodes) 

}, 

run = function(simulatedLogProbs = double(1)) { 

ans <-0.0 # (1) 

for(k in 1:getsize(mvSample)) { # (2) 

copy(from = mvSample, to = model, # (3) 

nodes = sampleNodes, row = k) 
logProbModel <- model$calculate(calculationNodes) # (4) 

if(!is.nan(logProbModel)) # (5) 

ans <- ans + exp(logProbModel - simulatedLogProbs[k]) 

} 

return(ans/getsize(mvSample)) # (6) 

returnType(double(0)) # (7) 

} 

) 


The specialization step for our example would be pumpIS <- importanceSample (model 
= pumpModel, sampleNodes = "theta[1:3]", mvSample = ISsample). Note that the 
arguments to importanceSample are defined in its setup function, model is given as the 
pumpModel object created above. sampleNodes - the set of nodes over which we want to 
integrate by importance sampling - is provided as a character vector using R’s standard 
variable subset notation. The ISsample object passed as the mvSample argument is a 
modelValues object for providing the values sampled from Pis. It does not need to be 
populated with sample values until the run function is called. Rather, at the setup stage, 
it just binds mvSample to (a reference to) ISsample for use in the run code. 

The only processing done in the setup code is to query the model for the vector of 
ordered stochastic dependencies of the sampleNodes. These are needed in the run code 
to calculate the necessary part of the model in topologically sorted order. The model is 
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queried using getDependencies, and the result saved in calculationNodes. In this case, 
calculationNodes will turn out to be (theta[1] , theta [2] , theta [3] , lambda [1] , 
lambda[2] , lambda[3] , x[l] , x[2], x[3]). This means that model$calculate(calculationNodes) 
will return log (-P(x [1:3] |theta[l: 3] fc)P(theta[l: 3] k))- 

The run code illustrates several features of the NIMBLE DSL. It shows type declaration 
of the simulatedLogProbs argument as a vector of doubles (double-precision numbers) 
and (7) the return type as a scalar double. simulatedLogProbs represents the vector of 
Pis(theta [1:3] k) values. In another implementation of importance sampling, this could be 
included in the mvSample object, but we use it here to illustrate a run-time argument. The 
body of the run function (1) initializes the answer to zero; (2) iterates over the samples 
in mvSample; (3) copies values of the sampleNodes from mvSample into the model; (4) 
calculates the sum of log probabilities of calculationNodes; and (5) uses basic if-then 
logic and math to accumulate the results. 

The most important insight about importanceSample is that it is model-generic: noth¬ 
ing in the setup code or run code is specihc to the pump model or nodes theta [1:3]. 

3.2 Metropolis-Hastings samplers 

Next we illustrate a Metropolis-Hastings sampler with a normally-distributed random-walk 
proposal distribution. The model-generic code for this is: 

simple_MH <- nimbleFunction( 

setup = function(model, currentState, targetNode) { 
calculationNodes <- model$getDependencies(targetNode) 

}, 

run = function(scale = double(0)) { 

logProb_current <- model$getLogProb(calculationNodes) # (1) 

proposalValue <- rnorm(l, mean = model[[targetNode]], sd = scale) # (2) 

model [ [targetNode] ] «- proposalValue # (3) 

logProb_proposal <- model$calculate(calculationNodes) # (4) 

log_Metropolis_Hastings_ratio <- logProb_proposal - logProb_current # (5) 

accept <- decide(log_Metropolis_Hastings_ratio) # (6) 

if(accept) 

copy(from = model, to = currentState, row =1, # (7a) 

nodes = calculationNodes, logProb = TRUE) 

else 

copy(from = currentState, to = model, row =1, # (7b) 

nodes = calculationNodes, logProb = TRUE) 
return(accept) 
returnType(integer(0)) 

» 

Suppose we want a sampler for theta[4] in the pump model. An example specializa¬ 
tion step would be theta4sampler <- simple_MH(model = pumpModel, currentState 
= mvState, targetNode = "theta[4]"). Here mvState is a modelValues object with 
variables that match those in the model, with only one of each. This is used to store the 
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current state of the model. We assume that on entry to the run function, mvState will 
contain a copy of all model variables and log probabilities, and on exit the run function 
must ensure that the same is true, reflecting any updates to those states. As in the im¬ 
portance sampling example, the only real work to be done in the setup function is to 
query the model to determine the stochastic dependencies of the targetNode. In this case 
calculationNodes will be (theta [4] , lambda [4] , x[4]). 

The run function illustrates the compactness of expressing a Metropolis-Hastings algo¬ 
rithm using language elements like calculate, getLogProb, copy, and list-like access to 
a model node. The scale run-time argument is the standard deviation for the normally 
distributed proposal value. In the full, released version of this algorithm (sampler_RW), 
the setup code includes some error trapping, and there is additional code to implement 
adaptation of the scale parameter ( |Haario et al. , 2001) rather than taking it as a run¬ 
time argument. The simplihed version here is less cluttered for illustration. In addition 
the full version is more efficient by using calculateDif f instead of both getLogProb and 
calculate, but here we use the latter to illustrate the steps more clearly. 

The lines of run (1) obtain the current sum of log probabilities of the stochastic depen¬ 
dents of the target node (including itself); (2) simulate a new value centered on the current 
value (model [ [targetNode] ]); (3) put that value in the model; (4) calculate the new sum 
of log probabilities of the same stochastic dependents; (5) determine the log acceptance 
probability; (6) call the utility function decide that determines the accept/reject decision; 
and (7) copy from the model to the currentState for (7a) an acceptance or (7b) vice-versa 
for a rejection. Again, the setup and run functions are fully model-generic. 

This example illustrates natural R-like access to nodes and variables in models, such 
as model [ [targetNode] ], but making this model-generic leads to some surprising syntax. 
Every node has a unique character name that includes indices, such as "theta[4]". This 
leads to the syntax model [ ["theta[4] "] ], rather than model [ ["theta"] ] [4]. The latter 
is also valid, but it is not model-generic because, in another specialization of simple_MH, 
targetNode may have a different number of indices. For example, if targetNode is "y[2, 
3]", model [ [targetNode] ] accesses the same value as model [ ["y"] ] [2,3]. The NIMBLE 
DSL also provides vectorized access to groups of nodes and/or variables. 


3.3 MCMC 

To illustrate a full set of MCMC samplers for a model, we do not provide nimbleFunction 
code as above but rather illustrate the flexibility provided by managing sampler choices 
from R. The hrst step in creating an MCMC is to inspect the model structure to decide 
what kind of sampler should be used for each node or block of nodes. An R function 
(conf igureMCMC) does this and returns an object with sampler assignments, which can 
be modihed before creating the nimbleFunctions to execute the MCMC. Since this is all 
written in R, one can control its behavior, modify the code, or write a completely new 
MCMC system. Once the user is happy with the MCMC configuration, the corresponding 
suite of specialized nimbleFunctions can be built, compiled, and executed. 

In the case of the pump model (see supplement), we choose for illustration to start with 
normal adaptive random walk samplers rather than Gibbs samplers. It is apparent from 
Figure 2 (left panel) that the posterior is correlated between alpha and beta. One might 
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then customize the sampler choices using this knowledge. For example, one can insert 
a bivariate (block) adaptive random walk sampler and then re-compile the MCMC. This 
results in improved mixing, reflected as lower autocorrelations of the chain (Fig. 2, middle 
panel) and higher effective sample size per second of computation (Fig. 2, right panel). 



Figure 2: Example of how high-level programmability and compilation allow flexible 
composition of efficient algorithms. This uses the “pump” model from the classic BUGS 
examples. Left panel: Parameters a and {3 show posterior correlation. Middle panel: 
MCMC mixing is summarized using the estimated autocorrelation function. When a bi¬ 
variate (block) adaptive random walk sampler is added to the suite of univariate adaptive 
random walk samplers, the chain autocorrelation decreases, reflecting better mixing. Right 
panel: Computational performance measured as the effective sample size per second of 
computation time is greater with the block sampler included. 


3.4 Monte Carlo Expectation Maximization 

MCEM is a widely known algorithm for maximum likelihood estimation for hierarchical 
models. It is used instead of the EM algorithm when the “expectation” step cannot be 
determined analytically. To our knowledge, there has been no previous implementation of 
MCEM that can automatically be applied to the range of model structures provided by 
BUGS. MCEM works by iterating over two steps: (1) MCMC sampling of the latent states 
given fixed parameters (top-level nodes); and (2) optimization with respect to (non-latent) 
parameters of the average log probability of the MCMC sample. NIMBLE provides a 
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buildMCEM function in which step (1) is implemented by creating an MCMC conhguration 
with samplers only for latent states, and step (2) is implemented by calling one of R’s 
optimizers with a compiled nimbleFunction as the objective function. The top level of 
control of the algorithm is an R function that alternates between these steps. For the pump 
model, the MCEM quickly settled within 0.01 of the published values of 0.82 and 1.26 for 


alpha and beta (George et ah, 1993), which we consider to be within Monte Carlo error. 


4 Discussion 

We have introduced a system for combining a flexible model specihcation language with 
a high-level algorithm language for model-generic programming, all embedded within R. 
Numerous other algorithms can be envisioned for implementation with this system, such 


as those listed in section (2.1) above 


However, several important challenges remain for building out the potential of NIM¬ 
BLE. First, not all features of BUGS, or of graphical models in general, have so far been 
incorporated. A particular challenge is efficient handling of stochastically indexed depen¬ 
dencies, such as when discrete mixture components are latent states. This represents a 
dynamic graph structure and so will require a more flexible system for representing depen¬ 
dencies. Second, several packages have made great use of automatic differentiation, notably 
ADMB/TMB and Stan. Because the NIMBLE compiler generates G-I--I- code, it would be 
possible to extend it to generate code that uses an automatic differentiation library. Third, 
there is a need to include more compilable functionality in the NIMBLE DSL, such as use 
of R’s optimization library from generated G-I--I-. An algorithm like Laplace approximation 
would be most natural if optimization and derivatives are available in the DSL. Finally, 
there is potential to extend the NIMBLE compiler in its own right as a useful tool for 
programming efficient computations from R even when there is no BUGS code involved. 

The choice to embed a compilable domain-specific language within R revealed some 
benefits and limitations. R’s handling of code as an object facilitates processing of BUGS 
models and nimbleFunction code. It also allows the dynamic construction and evaluation 
of class-definition code for each model and nimbleFunction and their G-|--|- interfaces. 
And it provides many other benefits, perhaps most importantly that it allows NIMBLE to 
work within such a popular statistical programming environment. On the negative side, 
NIMBLE needs some fundamentally different behavior than R, such as call-by-reference 
and functions that work by “side effects” (e.g., modifying an object without copying it). 
Such inconsistencies make NIMBLE something of a conceptual hybrid, which could be 
viewed as practical and effective by some or as inelegant or confusing by others. And for 
large models, NIMBLE’s compilation processing suffers from R’s slow execution. 

We built upon BUGS as a model specification language because it has become so widely 
used, but it has limitations. First, BUGS uses distribution notation slightly different from 
R, so combining BUGS and R syntaxes in the same system could be confusing. In particular 
some BUGS distributions use different default parameterizations than R’s distributions of 
the same or similar name. Second, BUGS does not support modular model programming, 
such as compactly declaring common model substructures in a way that re-uses existing 
code. It also does not support vectorized declarations of scalar nodes that follow the 
same pattern (it requires for-loops instead). These are extensions that could be built 
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into NIMBLE in the future. Other extensions, such as declaration of single multivariate 
nodes for vectorized calculations, were implemented almost automatically as a result of 
NIMBLE’s design. Third, one could envision powerful uses of programmatically generating 
model dehnitions rather than writing them in static code. This could be done via NIMBLE’s 
model dehnition system in the future. 

Other quite distinct lines of research on software for graphical models come from “prob¬ 
abilistic programming” efforts by computer scientists, such as Church (Goodman et ah 


2008) and BLOG (Milch et al., 2006). Their motivations are somewhat different, and their 
programming style and concepts would be new to many applied statisticians. It will be 
interesting to see where these two distinct motivations for similar programming language 
problems lead in the future. 
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